We will be looking at visualizations and analysis of data. We will be mapping the data to look for patterns as well as running some RDAs. The data in this project was obtained from the Northwest Fisheries Science Center (NWFSC). NOAA’s public access data inventory is a good place to look if you are wanting your own data but you will have to look by location. I choose to look at project’s within the Puget Sound area due to Seattle being my home base. The portion of the data my project will use is the Microbe and Nutrients data. The data was provided by Linda Rhodes who was studying the health of the pelagic food web in the Puget Sound in April-October of 2011.
The full data set and metadata: https://www.webapps.nwfsc.noaa.gov/apex/parrdata/inventory/tables/table/epa2011_physical_nutrients_microbes
Note Prior to uploading the CSV due to 52 columns of data and all headings being untidily labeled it was easier to remove in the CSV file than it was to tidy but tidying will also be shown below.
Data Definitions
Nutrients (Micromolar):
Nitrite: NO2
Nitrate: NO3
Ammonium: NH4
Phosphate: PO4
Silicate: SiOH4
Bacteria (cells/mL):
High Nucleic Acid Bacteria
Low Nucleic Acid Bacteria
Total Bacteria
Hypothesis: Nitrogen based nutrients will have a higher impact on bacteria.
1.Set Working Directory
ex: setwd(“~/MARE375/MARE375Final/MARE375Final”) or go to Session -> Set Working Directory
2. Import Data
Install all necessary packages by install.packages(packagename) there are many through this
example: install.packages(readr)
library(readr)
mic <- read_csv("EPA2011Microbial_Nutrient.csv")
## Rows: 463 Columns: 15── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (15): ID, lat, long, temp, NO3, NO2, NH4, PO4, SiOH4, total_bacteria, hi...
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
##View(mic)
You can also call your data by clicking your data set in Files -> Import Dataset… -> Import
You may have seen there is a code box above the Import button. You can grab the code from this area and put it in your mark down as I did above.
3.Looking over our data using ‘View(mic)’:
There are some missing values. There was no indication in the metadata why there were blanks. There are other areas that provide a 0 for their data so this bacteria may not be present in those areas and a 0 was just forgotten. In our learning purposes we can either put 0’s in, perform imputation, or remove the data. Here we are going to put 0’s.
mic[is.na(mic)] = 0
##View(mic)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✔ ggplot2 3.3.6 ✔ purrr 0.3.4
## ✔ tibble 3.1.7 ✔ stringr 1.4.0
## ✔ tidyr 1.2.0 ✔ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
mic1 <- mic %>% select ( -lat, -long, -ID, -temp)
##View(mic1)
source("panelutils.R") ## this source is needed to run the bivariate plot, it's not a package
pairs(mic1, panel = panel.smooth, diag.panel = panel.hist, main = "Bivariate Plot with Histograms and Smooth Curves", family= "Times")
It doesn’t look like there were any strong linear relationship between nutrients and microbes in general. We would expect there to be some significance between bacteria and nutrients due to their role in the food web in breaking down organic matter and releasing nutrients back into the environment. We should continue to look into this area for further analysis.
Split up our data in categories we want to look at:
Nutrients = nut
Bacteria = bac
Site Location = site
nut <- mic %>% select (NO2, NO3, NH4, PO4, SiOH4)
##View(nut)
bac <- mic %>% select (total_bacteria, low_nucleic_acid_bacteria, high_nucleic_acid_bacteria)
##View(bac)
site <- mic %>% select( "long", "lat")
##View(site)
Orienting yourself to your site on a map is a good visual starter. If you upload your latitude and longitude into Tableau you can get a map with your own sites labeled. You can choose from a variety of backgrounds such as gray and white or satellite as seen here.
Note: If you run into problems with Tableau causing issues with automatically using (SUM)Latitude and (SUM)Longitude instead of individual points turn off Aggregate Measures in the Analysis menu.
We can number the sites to keep track of them area. On this map the longitudinal scale is exaggerated which will help us see the numbers better but the pattern is still the same.
plot(site, type = "n", main = "Site Locations Numbered", xlab = "Longitude", ylab = "Latitude" ,family = "Times")
text(site, row.names(site), cex = 0.8, col = "darkslateblue" , family = "Times")
text(-123.0, 48.5, "Puget Sound , WA", cex = 1.2, col = "Red") #adding labels to identify areas due to longitudinal exaggeration
text(-122.72, 48.74, "Bellingham Area", cex = 0.8, col = "Black")
text(-122.27, 48.2, "Everett Area", cex = 0.8, col = "Black")
text(-122.27, 47.59, "Seattle Area", cex = 0.8, col = "Black")
text(-122.52, 47.7, "Bremerton Area", cex = 0.8, col = "Black")
text(-122.34, 47.29, "Tacoma Area", cex = 0.8, col = "Black")
text(-122.9, 47.29, "Olympia Area", cex = 0.8, col = "Black")
text(-123.09, 47.27, "Shelton Area", cex = 0.8, col = "Black")
text(-123.0, 48.0, "Strait of Juan de Fuca", cex = 0.8, col = "Black") #this is your large open blue area unlabeled on Tableau Map
Plotting the nutrient data on the map with the latitude and longitude shows us the concentrations of each throughout the stations
par(mfrow=c(1,2))
plot(site, asp = 1, main = "NO3 Concentrations by Site", pch = 21, col = "black", bg = "cornflowerblue", cex = 5*mic1$NO3/max(mic1$NO3),xlab = "Longitude", ylab = "Latitude", family = "Times")
##family = "Times" means the font is written in Times New Roman, this is far better than base fonts for this graph
plot(site, asp = 1, main = "NO2 Concentrations by Site", pch = 21, col = "black", bg = "coral3", cex = 5*mic1$NO2/max(mic1$NO2), xlab = "Longitude", ylab = "Latitude", family = "Times")
plot(site, asp = 1, main = "NH4 Concentrations by Site", pch = 21, col = "black", bg = "darkolivegreen3", cex = 5*mic1$NH4/max(mic1$NH4), xlab = "Longitude", ylab = "Latitude", family = "Times")
plot(site, asp = 1, main = "PO4 Concentrations by Site", pch = 21, col = "black", bg = "orchid", cex = 5*mic1$PO4/max(mic1$PO4), xlab = "Longitude", ylab = "Latitude", family = "Times")
plot(site, asp = 1, main = "SiOH4 Concentrations by Site", pch = 21, col = "black", bg = "slateblue", cex = 5*mic1$SiOH4/max(mic1$SiOH4), xlab = "Longitude", ylab = "Latitude", family = "Times")
Looking for patterns in Nutrients at Sites:
There are some larger concentrations of NH~4 in the Olympic and Bremerton Area and severely lacking concentrations in the inlet leading toward Shelton.
There are also some high concentrations of PO4 from Everett toward Bellingham.
NO~3 , NO~2 , and NH~4 don’t appear to have anything super significant popping out.
par(mfrow=c(1,2))
plot(site, asp = 1, main = "Low Nucleic Acid Bacteria", pch = 21, col = "black", bg = "cyan4", cex = 5*mic1$low_nucleic_acid_bacteria/max(mic1$low_nucleic_acid_bacteria), xlab = "Longitude", ylab = "Latitude", family = "Times")
plot(site, asp = 1, main = "High Nucleic Acid Bacteria", pch = 21, col = "black", bg = "darkorange", cex = 5*mic1$high_nucleic_acid_bacteria/max(mic1$high_nucleic_acid_bacteria), xlab = "Longitude", ylab = "Latitude", family = "Times")
plot(site, asp = 1, main = "Total Bacteria By Site", pch = 21, col = "black", bg = "darkorchid4", cex = 5*mic1$total_bacteria/max(mic1$total_bacteria),xlab = "Longitude", ylab = "Latitude", family = "Times")
Looking for Patterns in Bacteria and Nutrients at Sites:
All 3 graphs show that the same area of High PO~4 Concentrations are low bacterial concentrations
Low Nucleic Acid Bacteria have a pattern near the Shoreline of in the inlet leading toward Shelton.
High Nucleic Acid Bacteria appear to be concentrated near Olympia.
Use the Hellinger transformation to obtain eigan values for an RDA and try to normalize our data.
library(vegan)
## Loading required package: permute
## Loading required package: lattice
## This is vegan 2.5-7
bac.hel <- bac.hel <- decostand(bac, "hellinger")
head(bac)
## # A tibble: 6 × 3
## total_bacteria low_nucleic_acid_bacteria high_nucleic_acid_bacteria
## <dbl> <dbl> <dbl>
## 1 488326. 37494. 450832.
## 2 0 0 0
## 3 2123498. 261850. 1861648.
## 4 1987126. 472888. 1514238.
## 5 3173723. 293880. 2879843.
## 6 470499. 42698. 427801.
head(bac.hel)
## total_bacteria low_nucleic_acid_bacteria high_nucleic_acid_bacteria
## 1 0.7071068 0.1959348 0.6794186
## 2 0.0000000 0.0000000 0.0000000
## 3 0.7071068 0.2483048 0.6620761
## 4 0.7071068 0.3449464 0.6172617
## 5 0.7071068 0.2151719 0.6735734
## 6 0.7071068 0.2130139 0.6742589
source("evplot.R")
source("hcoplot.R")
bac.rda <- rda(bac.hel ~ ., nut)
##summary(bac.rda) The summary is hidden due to the large output important results below
R2adj <- RsquareAdj(bac.rda)$adj.r.squared
R2adj
## [1] 0.1952862
What does this mean?
This analysis yielded 3 canonical axes with eigenvalues labelled RDA1 to RDA3. and 3 additional, unconstrained axes for the residuals with eigenvalues labelled PC1 to PC3. The cumulative contribution to the variance obtained by the 3 canonical axes is R= 0.204 and of that RDA1 makes up almost the entirety of the variance at 19.452%. The Adjusted R = 0.1952862 or 19.529% of the data is explained by the 3 canonical axes.
The triplot shown below show that NO3, SiOH4, and NH4 play a significant role in the dispersion of the sites along the first axis.
plot(bac.rda, scaling = 1, main = "Triplot RDA of bac.hel ~ nut - scaling 1", family="Times" )
bac.sc <- scores(bac.rda, choices = 1:2, scaling = 1, display = "sp")
arrows(0, 0, bac.sc[,1], bac.sc[,2], length = 0, lty = 1, col = "red")
##I am for some reason unable to get which bacteria is which, and site numbers but would be crowded regardless
Thank you